In this project, I will analyze data from the New York City Department of Education to understand whether parent, teacher, and student perceptions of school quality are related to average school SAT scores (an indicator of academic performance).
I’ll start by loading the packages needed for this analysis:
library(readr)
library(dplyr)
library(stringr)
library(purrr)
library(tidyr)
library(ggplot2)
library(readxl)
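Note that library() only attaches packages that are already installed. As a minimal sketch, the check below reports which of the packages used here are missing before they are loaded. The names `pkgs`, `is_installed`, and `missing_pkgs` are illustrative and not part of the original analysis, and the install step is left commented out because it needs network access.

```r
# Packages used in this analysis, listed so the check is self-contained.
pkgs <- c("readr", "dplyr", "stringr", "purrr", "tidyr", "ggplot2", "readxl")

# requireNamespace() returns FALSE (instead of raising an error)
# when a package is not installed.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install from CRAN (requires network access):
  # install.packages(missing_pkgs)
}
```

If `missing_pkgs` is empty, the library() calls above should run without "there is no package called" errors.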
Specifically, I’ll investigate the following questions:
1. Do student, teacher, and parent perceptions of NYC school quality appear to be related to demographic and academic success metrics?
2. Do students, teachers, and parents have similar perceptions of NYC school quality?
Importing Data
survey_dict <- read_xls("Survey Data Dictionary.xls")
New names:
* `` -> ...2
survey_dict
The file Survey Data Dictionary.xls contains metadata that will help decide how to clean and prepare the survey data for analysis.
survey_data_gen <- read_tsv("masterfile11_gened_final.txt")
Parsed with column specification:
cols(
.default = col_double(),
dbn = col_character(),
bn = col_character(),
schoolname = col_character(),
studentssurveyed = col_character(),
schooltype = col_character(),
p_q1 = col_logical(),
p_q3d = col_logical(),
p_q9 = col_logical(),
p_q10 = col_logical(),
p_q12aa = col_logical(),
p_q12ab = col_logical(),
p_q12ac = col_logical(),
p_q12ad = col_logical(),
p_q12ba = col_logical(),
p_q12bb = col_logical(),
p_q12bc = col_logical(),
p_q12bd = col_logical(),
t_q6m = col_logical(),
t_q9 = col_logical(),
t_q10a = col_logical()
# ... with 18 more columns
)
See spec(...) for full column specifications.
survey_data_gen
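The column specification above shows several question columns (for example p_q1) parsed as col_logical(). A likely explanation, stated here as an assumption rather than something verified against the file, is that readr guesses each type from the rows it inspects and falls back to logical when a column is entirely empty there. If such columns are expected to be numeric, the types can be declared explicitly. The sketch below uses a small hypothetical file (`demo_path`, with made-up rows and a made-up `score` column) rather than the real survey data.

```r
library(readr)

# Hypothetical three-row TSV standing in for the survey file;
# the p_q1 column is deliberately left empty in every row.
demo_path <- tempfile(fileext = ".txt")
writeLines(
  c("dbn\tschoolname\tp_q1\tscore",
    "01M015\tSchool A\t\t8.5",
    "01M019\tSchool B\t\t7.9",
    "01M020\tSchool C\t\t8.1"),
  demo_path
)

# Default behaviour: types are guessed, so the empty column becomes logical.
guessed <- suppressMessages(read_tsv(demo_path))

# Explicit specification: numeric by default, character for identifier columns.
typed <- read_tsv(
  demo_path,
  col_types = cols(
    .default   = col_double(),
    dbn        = col_character(),
    schoolname = col_character()
  )
)

class(guessed$p_q1)
class(typed$p_q1)
```

Whether this is worth doing for the real file depends on which columns the later analysis uses; columns that are dropped during cleaning do not need their types corrected.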